Skip to content

[pick](branch-4.1)pick 62947 63055 63070 to 4.1#63297

Open
Mryange wants to merge 3 commits into
apache:branch-4.1from
Mryange:branch-4.1-pick-62947
Open

[pick](branch-4.1)pick 62947 63055 63070 to 4.1#63297
Mryange wants to merge 3 commits into
apache:branch-4.1from
Mryange:branch-4.1-pick-62947

Conversation

@Mryange
Copy link
Copy Markdown
Contributor

@Mryange Mryange commented May 15, 2026

What problem does this PR solve?

#62947
#63055
#63070

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

Mryange added 3 commits May 15, 2026 17:15
…s to DataQueue (apache#62947)

The DataQueue implementation used parallel vectors (one per child queue
—
lock, blocks, counters, etc.) scattered across the class, making it hard
to
reason about which lock protects which field. The old INJECT_MOCK_SLEEP
pattern injected test randomness through a macro wrapping
std::lock_guard,
but this was fragile and didn't compose with Clang's -Wthread-safety
static
analysis. Several methods (e.g., set_source_block) performed lock-free
  checks followed by locked re-checks, risking subtle races.

Root cause: per-child state was not encapsulated, lock/field
relationships
  were implicit, and no static analysis guarded them.
This PR:

- Introduces a SubQueue struct that groups all per-child state (queue
lock
+ blocks, free lock + free blocks, counters, sink dependency pointer)
with
explicit GUARDED_BY annotations.
thread-safety macros, an AnnotatedMutex wrapper, and a LockGuard that
replaces std::lock_guard. In BE_TEST builds, LockGuard injects random
sleep
before lock acquisition and after release to exercise concurrent code
paths — replacing the old INJECT_MOCK_SLEEP macro.
  - Enables -Wthread-safety in Clang builds.
  - Moves dependency notifications (set_ready, block, set_always_ready)
outside the queue lock in try_pop/try_push/clear_blocks to avoid nested
lock ordering issues.
- Fixes set_source_block to always hold _source_lock when reading
_source_dependency, eliminating the lock-free pre-check.
- Adds 20+ new unit tests covering SubQueue methods and DataQueue edge
  cases (empty pop, push-after-finished, capacity blocking, finish
idempotency, clear_blocks, low-memory mode, terminate, free-block reuse,
child_idx routing).

(cherry picked from commit efd7067)
…63055)

### What problem does this PR solve?

Issue Number: N/A

Problem Summary:
`DataQueueTest.MultiTest` could intermittently hang after DataQueue
moved sink dependency notifications outside the per-sub-queue lock. Root
cause: `SubQueue` queue state and `sink_dependency` state were no longer
serialized by `queue_lock`, so a producer could observe its sink
dependency as blocked even after the queue had already become empty,
leaving no future push/pop to wake it. This patch updates
`sink_dependency->set_ready()` and `sink_dependency->block()` while
holding `queue_lock`, keeping queue occupancy and sink readiness
transitions atomic with respect to each other.

Related PR: apache#62947

(cherry picked from commit 17bbba4)
…ated wrappers for thread safety analysis (apache#63070)

This PR introduces Clang thread safety annotations (-Wthread-safety) to
pipeline operator shared states by
replacing raw std::mutex/std::lock_guard/std::unique_lock with annotated
wrappers (AnnotatedMutex, LockGuard,
UniqueLock), and by decorating guarded member variables with GUARDED_BY
attributes. This enables the compiler to
statically detect data races where a field is accessed without holding
its associated mutex.

The change also fixes two bugs uncovered during the annotation process:

- MultiCastDataStreamer::push: _eos (member) was checked instead of eos
(parameter), causing the "set always ready"
branch to fire on the prior call's stale state rather than the current
one.
- MultiCastDataStreamer::pull's spill lambda:
_cached_blocks[sender_idx].empty() was checked outside the mutex; the
   check is now done via a boolean captured inside the lock.

(cherry picked from commit 2dc0d0a)
@Mryange Mryange requested a review from yiguolei as a code owner May 15, 2026 09:23
@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented May 15, 2026

run buildall

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants